Entity Resolution and Federated Learning get a Federated Resolution
Authors
Abstract
Consider two data providers, each maintaining records of different feature sets about common entities. They aim to learn a linear model over the whole set of features. This problem of federated learning over vertically partitioned data includes a crucial upstream issue: entity resolution, i.e. finding the correspondence between the rows of the datasets. It is well known that entity resolution, just like learning, is mistake-prone in the real world. Despite the importance of the problem, there has been no formal assessment of how errors in entity resolution impact learning. In this paper, we provide a thorough answer to this question, showing how optimal classifiers, empirical losses, margins and generalisation abilities are affected. While our answer spans a wide set of losses (going beyond proper, convex, or classification-calibrated ones), it brings simple practical arguments to upgrade entity resolution as a preprocessing step to learning. As an example, we modify a simple token-based entity resolution algorithm so that it avoids matching rows belonging to different classes, and perform experiments in the setting where entity resolution relies on noisy data, which is very relevant to real-world domains. Notably, our approach covers the case where one peer does not have classes, or has only a noisy record of classes. Experiments show that using the class information during entity resolution can buy significant uplift for learning at little expense from the complexity standpoint.
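To make the idea of class-aware, token-based entity resolution concrete, here is a minimal sketch. It is not the algorithm from the paper: the Jaccard similarity, the greedy one-to-one matching, the function names (`tokens`, `jaccard`, `resolve`) and the toy records are all assumptions chosen for illustration. The only element taken from the abstract is the constraint that rows known to belong to different classes are not matched. Treating a label of `None` as "unknown, hence unconstrained" is likewise an assumption.

```python
from typing import Dict, List, Optional, Sequence


def tokens(text: str) -> frozenset:
    """Lowercased whitespace tokens of a record's identifying string."""
    return frozenset(text.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(
    rows_a: Sequence[str],
    rows_b: Sequence[str],
    labels_a: Optional[Sequence[Optional[int]]] = None,
    labels_b: Optional[Sequence[Optional[int]]] = None,
) -> Dict[int, int]:
    """Greedy one-to-one matching by descending token similarity.

    If labels are supplied for both sides, a pair whose labels are both
    known and different is never matched (hypothetical class-aware rule).
    """
    ta = [tokens(r) for r in rows_a]
    tb = [tokens(r) for r in rows_b]
    candidates: List[tuple] = []
    for i, x in enumerate(ta):
        for j, y in enumerate(tb):
            if labels_a is not None and labels_b is not None:
                la, lb = labels_a[i], labels_b[j]
                if la is not None and lb is not None and la != lb:
                    continue  # forbid cross-class matches
            score = jaccard(x, y)
            if score > 0.0:
                candidates.append((score, i, j))
    # Highest similarity first; ties broken by row indices for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    used_a, used_b, match = set(), set(), {}
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            match[i] = j
            used_a.add(i)
            used_b.add(j)
    return match


# Toy data (invented): row k of A and row k of B describe the same entity,
# but B's identifiers are noisy (a typo in row 0, a missing token in row 1).
rows_a = ["maria silva porto", "mario silva porto"]
labels_a = [1, -1]
rows_b = ["mario silva porto", "mario silva"]
labels_b = [1, -1]

plain = resolve(rows_a, rows_b)
class_aware = resolve(rows_a, rows_b, labels_a, labels_b)
print("class-agnostic:", plain)
print("class-aware:   ", class_aware)
```

On this toy input the class-agnostic pass pairs each row of A with the record of the other class, because the typo makes the wrong record a perfect token match. The class-aware pass recovers the intended pairs at the cost of one extra label comparison per candidate pair.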
Similar resources
Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption
Consider two data providers, each maintaining private records of different feature sets about common entities. They aim to learn a linear model jointly in a federated setting, namely, data is local and a shared model is trained from locally computed updates. In contrast with most work on distributed learning, in this scenario (i) data is split vertically, i.e. by features, (ii) only one data pr...
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
The Role of Asserted Resolution in Entity Identity Information Management
This paper introduces the concept of asserted resolution as a technique for entity resolution. In asserted resolution trusted information sources are used to force the equivalence (or non-equivalence) of entity references and identity structures regardless of matching conditions. The paper proposes five specific forms of assertion to support entity identity information management, the process o...
A Negotiation Process Approach for Building Federated Databases
The negotiation process is often referred to in the literature on federated databases, but is seldom covered in depth. This process is essential to determine data of the component schema to be integrated for building a federated schema and the access permissions to be granted. This paper presents our negotiation process approach which is incorporated in the integration schemas mechanism, so we ...
Publication date: 2018